Abstract

Sophistication and logical depth are two measures that express how complicated the structure in a string is. Sophistication is defined as the minimal complexity of a computable function that defines a two-part description for the string that is shortest within some precision; logical depth can be defined as the minimal computation time of a program that is shortest within some precision. We show that the Busy Beaver function of the sophistication of a string exceeds its logical depth with logarithmically bigger precision, and that logical depth exceeds the Busy Beaver function of sophistication with logarithmically bigger precision. We also show that this is not true if the precision is only increased by a constant (when the notions are defined with plain Kolmogorov complexity). Finally, we show that sophistication is unstable in its precision: constant variations can change its value by a linear term in the length of the string.

1 Introduction 

Solomonoff [25], Kolmogorov [14] and Chaitin [5] independently defined a measure of information contained in a bit string x as the length of a shortest program that produces x on a universal Turing machine. This measure, usually represented by C(x), is called Kolmogorov complexity. Kolmogorov complexity does not express whether the string contains sophisticated structure. For example, consider for some n a randomly generated n-bit string. With high probability its complexity is about n and the string has no (complicated) structure. On the other hand, the (2^n − 1)-bit string representing the Halting problem for programs of length less than n also has complexity close to n, but has very complicated structure. Informally, "sophistication of structure" can be measured by the computation time of a program modelling the structure or by the minimal size of a program that models the structure.

The first notion is Bennett's logical depth [4]. At significance level c, it is defined as the minimal time to compute x by a program p that is c-incompressible on a universal prefix-free Turing machine (of some type), i.e. C(p) ≥ |p| − c. Bennett [4] showed that this measure is closely related to the minimal time for which some time-bounded version of algorithmic probability converges within a factor 2^c. We will use the following simpler variant (which is closely related, see Section 3):

"the time required to compute x by a program no more than c bits longer than a shortest program".

Examples of strings that are non-deep according to this definition are the random strings and the (efficiently) computable ones. In [2], this notion was used to show that if the complexity class NP reduces to a sequence for which every initial segment is not deep up to "polylog" precision in the length of the string, then the polynomial time hierarchy collapses. In particular, it would imply a collapse if NP reduces to a sparse or to a random set.

Koppel [17] defined a different notion of depth for infinite sequences based on monotone complexity. The class of deep sequences consists of the sequences for which the depth of initial segments is not bounded by a computable function of their length. In particular, the set of such sequences is disjoint from the set of random ones, and hence it has measure zero. Lutz [13] showed that deep sequences contain useful information in the following computational sense: the class of sequences that truth-table reduce to them has non-zero measure in the class of computable sequences.

Kolmogorov [15, 16] defined for each string the notion of structure function, dividing a shortest program for a string into two parts, one part accounting for useful regularities and another accounting for the remaining information present in the string, in such a way that this two-part description is as small as the shortest one-part description. He represented the regularities in the string by finite sets. Later, Koppel [17, 18, 19] expressed regularities as monotone computable functions and called the minimal complexity of the function defining a shortest two-part code sophistication. Following Koppel's work, Li and Vitanyi [21, p. 100] and independently Antunes and Fortnow [1] revisited the notion of sophistication considering computable functions (that are not necessarily monotone). It was observed that there are strings with near maximum sophistication, and such strings encode the halting problem for smaller programs (see e.g. [1]). Furthermore, in [1] the authors introduced the notion of coarse sophistication, and showed that it is roughly equivalent to a variation of computational depth based on the Busy Beaver function. In Section 2 we present a more detailed overview of the literature on these sophistication measures.

Sophistication and logical depth are conceptually very different, since the former measures program lengths while the latter measures running times. In order to establish a relationship between these measures, we rescale logical depth from running time to program length using the Busy Beaver function. In this scenario, we prove that, up to logarithmic changes of the significance of both measures, they are equal up to logarithmic terms. From this, we conclude that all sophistication measures defined using Kolmogorov complexity are equivalent in this sense. Then we prove that such an equivalence of sophistication and logical depth is not possible up to constant changes in significance. Finally, we study the stability of sophistication under changes of significance. From [27, Theorem IV.4], one concludes that a logarithmic change of the significance can change sophistication maximally (i.e. by almost |x|). We show that this also holds for constant changes of the significance.

The rest of the paper is organized as follows: in the next section we overview several variants of sophistication. In Section 3 we introduce some definitions used in the rest of the paper. In Sections 4 and 5 we present and prove the new results mentioned in the previous paragraph.



2 Structure function, sophistication and variants

Kolmogorov [15, 16] (see [7]) raised the question whether there exists a string x for which no finite set of small Kolmogorov complexity¹ exists in which it is a "typical element". He defined x to be c-typical in S ∋ x iff log |S| − C(x|S) ≤ c, i.e., a literal representation of the lexicographic index of x in S (of length log |S| + O(1)) is almost a shortest description for x given S. Note that by a counting argument, at most a fraction 2^{−c+1} of the elements in the set can be non-typical.²

In [23], [?], [10] it was shown that absolutely non-stochastic objects exist, i.e., some strings are only typical in sets of complexity close to the length of the string. Such strings have high mutual information with the Halting sequence, and with high probability they do not appear in statistical experiments.

Kolmogorov also considered a more restrictive class of good set-models for a string x. To understand this criterion, consider the structure set, which is the set of all pairs (i, j) for which there is an x-containing set of complexity at most i and cardinality at most 2^j, see Figure 1. For x of length n, the set contains the points (0, n) and (C(x), 0), witnessed respectively by the set of all n-bit strings and by {x} (we ignore O(log n)-terms). Note that if the set contains (i, j), then it also contains all pairs (i + k, j − k) for k ≤ j.³ Hence the lower border of the set, called the structure function, is non-increasing (still ignoring O(log n) terms). No point appears below the line i + j = C(x), since otherwise the corresponding set could be used to construct a program for x of size less than C(x). Cover [6] (see also [?, Sect. 14.12] and [21, Sect. 5.5.1]) explicitly mentioned the left-most place where the set approaches this line, which we will call the set sophistication of x:

$$\mathrm{soph}^{set}_c(x) = \min\{ C(S) : x \in S \;\wedge\; C(S) + \log|S| \le C(x) + c \}.$$

Figure 1: Structure function

¹ The Kolmogorov complexity of a set S is the length of a shortest program that prints all elements of the set and halts.

² The converse is also true: any set has non-typical elements unless the set contains a lot of mutual information with the Halting problem.

³ Partition the set into subsets of size at most 2^{j−k}; this increases the complexity of the x-containing set by at most k.

It is not hard to show that if S is c-sufficient for x (i.e. S satisfies the conditions in the definition of soph^set_c(x) above), then x is (c + O(log |x|))-typical in S, where |x| denotes the length of x. In [27] it was shown that if there exists a set S in which x is c-typical, then there exists a set of complexity C(S) + O(log |x|) which is (c + O(log |x|))-sufficient for x. Hence, if sophistication were defined in terms of c-typical sets, then we would obtain a function that is O(log |x|)-close; here two non-increasing functions f and g are said to be e-close for some natural number e if f(c + e) ≤ g(c) + e and g(c + e) ≤ f(c) + e for all c. From [27, Theorem IV.4] it follows that every decreasing function f is (C(f) + O(log |x|))-close to the structure function of some string of length f(0), which implies that every non-increasing function with f(k) = 0 is close to the sophistication of some string of length f(0) + k and complexity k (in the interval [0, k]).⁴ This also implies that sophistication is unstable: by increasing its parameter c by O(log |x|), the sophistication can drop from its maximal value |x| − O(log |x|) to O(log |x|).
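To make the notion of e-closeness concrete, the following Python sketch checks it for two non-increasing functions that are known only on finitely many significance levels. The two tables are hypothetical profiles chosen for illustration (sophistication itself is not computable), and the check is necessarily restricted to those c for which c + e is still tabulated.

```python
def is_e_close(f, g, e):
    """Check e-closeness of two non-increasing functions given as tables.

    f[c] and g[c] are the (hypothetical) values at significance level c.
    The definition requires f(c + e) <= g(c) + e and g(c + e) <= f(c) + e;
    here this is only checked for those c where c + e is in the table.
    """
    if len(f) != len(g):
        raise ValueError("both tables must cover the same significance levels")
    for c in range(len(f) - e):
        if f[c + e] > g[c] + e or g[c + e] > f[c] + e:
            return False
    return True


# Hypothetical profiles: g drops one significance level earlier than f.
f = [10, 10, 9, 6, 3, 1, 0, 0]
g = [10, 9, 6, 3, 1, 0, 0, 0]
print(is_e_close(f, g, 0), is_e_close(f, g, 2))  # prints: False True
```

Since both inequalities are checked, the relation is symmetric in f and g, and every function is 0-close to itself.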
In [10] it is shown that a sufficient set of almost minimal complexity of a string x can be computed from an initial segment of the binary code of the number of halting programs of length C(x). Hence, such a set contains high mutual information with the Halting problem for short programs. (See [24] for the proofs of this and the above results.) In [?, ?] it is concluded that this statistic can hardly be interpreted as a denoised version of x. In fact, compared to x, a sufficient two-part code (S, z) (where z is the lexicographic index of x in S) can contain different computational information from x, although C(x|S, z) and C(S, z|x) are small. The proposed solution is to impose the existence of a computable bijection of small complexity between x and (S, z). This is equivalent to the requirement that there exists a short total program computing S from x. In [?] it was shown that this leads to a notion of sophistication that is different from set sophistication.

By varying the model type and the type of Kolmogorov complexity in the notion of minimal sufficient sets, many variants of sophistication measures have been introduced. We say that a function f is c-sufficient for x if there is a string d such that f(d) = x and C(f) + |d| ≤ C(x) + c. A computable probability density function P is c-sufficient for x if C(P) + log(1/P(x)) ≤ C(x) + c.⁵ In an analogous way, function and probabilistic sophistication at significance level c are defined as the minimal complexity of a computable function, respectively of a computable probability density function, that is c-sufficient for x. In [25, Lemmas 7.1 and 7.2] it is proved that set, function, and probabilistic sophistication are all O(log |x|)-close. In order to generalize the notion of sophistication to (infinite) sequences, Koppel [17, 18, 19] considered monotone computable functions f as models. The sufficiency criterion for the two-stage code for x is the existence of a string d such that f(d) = x and Km(f) + |d| ≤ H(x) + c, where H(x) is the minimal length of such a two-part description for x on some special monotone machine and Km(x) denotes the monotone Kolmogorov complexity corresponding to the machine. It is not hard to show that H(x) = Km(x) + O(log |x|), and also that this notion of sophistication is O(log |x|)-close to the aforementioned notions.⁶ On this model, Koppel defined sophistication and depth for sequences in two variants, and for each variant he showed that sophistication and depth are equal up to O(1)-constants.

⁴ This result is stated in terms of the MDL function λ_x(c), and by unfolding the definitions, one can observe that for fixed x, the inverse of λ_x(c) − C(x) equals the set sophistication soph^set(x).

⁵ This probabilistic sufficiency criterion was defined in [10] in terms of prefix-free complexity, because 2^{−K(x)} defines a probability distribution and hence it is natural to compare it with P(x). Prefix complexity and plain complexity differ by at most O(log |x|) [21], and this precision is sufficient for our discussion.

⁶ It is unclear whether H(x) = Km(x) + O(1).

A last variant of sophistication is called effective complexity [11, 12]. This notion uses a probability density function P. Inspired by an information-theoretic solution of the problem of Maxwell's Demon, the total entropy of P has been defined as C(P) + H(P), where H(P) = Σ_x P(x) log2(1/P(x)) denotes the Shannon entropy of P.⁷ A probability density function P is a c-good model for x if C(P) + H(P) ≤ C(x) + c and log(1/P(x)) ≤ H(P) + c.⁸ The c-effective complexity is the minimal complexity of a c-good model. In [22, Lemma 21], it is shown that effective complexity is close to set sophistication. Indeed, if P is c-good then it is 2c-sufficient, since C(P) + log(1/P(x)) ≤ C(P) + H(P) + c ≤ C(x) + 2c. For the other direction, note that at most 1 + H(P)2^{H(P)+c+1} elements satisfy log(1/P(x)) ≤ ⌈H(P)⌉ + c,⁹ and these elements can be computed given P and ⌈H(P)⌉ ≤ C(x) + c ≤ |x| + c + O(1). Hence a c-good model defines a (c + O(log |x|))-sufficient set. In [?, Theorem 18] it is also shown that strings with high effective complexity have very high computational depth. Moreover, the proof shows that effective complexity is upper bounded by the inverse Busy Beaver function of logical depth with slightly bigger significance. Our Theorem 4 also implies the other direction, i.e. that effective complexity is O(log |x|)-close to the inverse Busy Beaver function of logical depth.
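The counting step in the previous paragraph (together with its footnote) can be checked numerically. The Python sketch below uses a randomly generated finite distribution P as a hypothetical stand-in for a computable probability density function; it computes the Shannon entropy H(P), counts the points with log(1/P(x)) ≤ ⌈H(P)⌉ + c, and compares the count with the bound 1 + H(P)2^{H(P)+c+1}.

```python
import math
import random


def shannon_entropy(p):
    """H(P) = sum over x of P(x) * log2(1 / P(x)); zero-probability points are skipped."""
    return sum(px * math.log2(1 / px) for px in p if px > 0)


def count_likely(p, c):
    """Number of points x with log2(1 / P(x)) <= ceil(H(P)) + c."""
    threshold = math.ceil(shannon_entropy(p)) + c
    return sum(1 for px in p if px > 0 and math.log2(1 / px) <= threshold)


def counting_bound(p, c):
    """The bound 1 + H(P) * 2^(H(P) + c + 1) from the text."""
    h = shannon_entropy(p)
    return 1 + h * 2 ** (h + c + 1)


# A hypothetical skewed distribution on 256 points.
random.seed(1)
weights = [random.random() ** 4 for _ in range(256)]
total = sum(weights)
P = [w / total for w in weights]
for c in range(4):
    print(c, count_likely(P, c), round(counting_bound(P, c), 1))
```

The bound is far from tight for most distributions (the trivial bound 2^{⌈H(P)⌉+c} also holds because P sums to 1); it only matters that the number of candidates is 2^{O(H(P)+c)}, so an index into them costs about H(P) + c bits.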

3 Preliminaries 

In this work x, y, p, q are used for strings and n, c for natural numbers. |x| is the length of x, and log n means ⌊log2 n⌋. We fix a reference Turing machine U that is universal in the following sense: for any other machine V, there is a string w_V such that U(w_V p) = V(p) if V(p) is defined.

The Kolmogorov complexity of x is defined as

$$C(x) = \min\{ |p| : U(p) = x \}.$$



⁷ The definition of total entropy used in [11, 12] is K(P) + H(P). Notice that plain and prefix complexity are sufficiently close: |K(P) − C(P)| ≤ O(log C(P)). See also footnote 5.

⁸ In fact, in [11] the precision for which these inequalities should hold is not discussed. Also, the authors suggest that the computation time of a program for P is bounded by some computable function. In [22] the first requirement is chosen with c = δ|x| for some δ > 0, and in the second requirement a different parameter is chosen. Furthermore, P should be computable as a real function and no restrictions on the computation time are considered. Also, K(P) is replaced by K(P, H(P)).

⁹ Let N be the number of strings x such that P(x) ≥ 2^{−⌈H(P)⌉−c}. For at most one element we have log2(1/P(x)) < 1. Hence H(P) ≥ (N − 1)2^{−⌈H(P)⌉−c} ≥ (N − 1)2^{−H(P)−c−1}, i.e. N ≤ 1 + H(P)2^{H(P)+c+1}.


Notice that a change of the universal machine U affects Kolmogorov complexity by no more than an additive constant term.

Koppel [17], using monotone functions as a model, defined sophistication for infinite strings. Later, Li and Vitanyi [20] and Antunes and Fortnow [1] independently simplified Koppel's definition of sophistication for finite strings, using computable functions (that are not necessarily monotone).

Definition 1 (as in [1]). The sophistication of x with significance c (also called c-sophistication) is defined as

$$\mathrm{soph}_c(x) = \min_p \left\{ |p| \;:\; U(p, d) \text{ is defined for all } d, \text{ and there is a } d \text{ s.t. } U(p, d) = x \text{ and } |p| + |d| \le C(x) + c \right\}.$$


Bennett [4] defined the c-significant logical depth of an object x as the time required by the machine U to generate x with a program p that is c-incompressible (i.e. C(p) ≥ |p| − c). We use a more intuitive version of logical depth (also discussed in [4]), and argue at the end of this section that for our purposes this notion is sufficiently close to Bennett's definition. A string's logical depth at significance level c is

$$\mathrm{depth}_c(x) = \min\{ \mathrm{time}(p) : |p| \le C(x) + c \text{ and } U(p) = x \},$$

where time(p) denotes the number of computation steps made by U to reach a halting state.

One can scale down the running time to program length using the inverse Busy Beaver function

$$\mathrm{bb}(n) = \min\{ |p| : U(p) \text{ halts and } \mathrm{time}(p) \ge n \}.$$

Based on this function one can consider the following variant of logical depth.

Definition 2. The Busy Beaver logical depth of x with significance c is defined as

$$\mathrm{depth}^{bb}_c(x) = \mathrm{bb}(\mathrm{depth}_c(x)) = \min_{p,q} \left\{ |q| \;:\; |p| \le C_U(x) + c,\; U(p) = x \text{ and } \mathrm{time}(p) \le \mathrm{time}(q) \right\}.$$
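The definitions of C(x), depth_c(x), bb(n) and depth^bb_c(x) can be traced on a finite toy machine. In the Python sketch below, the table TOY_U of halting programs, their outputs and their step counts is entirely hypothetical (for the real universal machine none of these quantities is computable); the code evaluates both forms of Definition 2 and can be used to check that they agree.

```python
from typing import Dict, Optional, Tuple

# Hypothetical halting programs of a toy machine: program -> (output, steps).
TOY_U: Dict[str, Tuple[str, int]] = {
    "0": ("1", 3),
    "1": ("0", 2),
    "00": ("0101", 40),
    "01": ("111", 5),
    "10": ("0101", 7),
    "110": ("0101", 2),
    "111": ("1111", 900),
}


def C(x: str) -> Optional[int]:
    """Toy Kolmogorov complexity: length of a shortest program printing x."""
    lengths = [len(p) for p, (out, _) in TOY_U.items() if out == x]
    return min(lengths) if lengths else None


def depth(x: str, c: int) -> Optional[int]:
    """depth_c(x): least running time of a program for x of length <= C(x) + c."""
    cx = C(x)
    if cx is None:
        return None
    times = [t for p, (out, t) in TOY_U.items() if out == x and len(p) <= cx + c]
    return min(times) if times else None


def bb(n: int) -> Optional[int]:
    """Inverse Busy Beaver: length of a shortest program running >= n steps."""
    lengths = [len(q) for q, (_, t) in TOY_U.items() if t >= n]
    return min(lengths) if lengths else None


def depth_bb(x: str, c: int) -> Optional[int]:
    """First form of Definition 2: bb(depth_c(x))."""
    t = depth(x, c)
    return None if t is None else bb(t)


def depth_bb_pairs(x: str, c: int) -> Optional[int]:
    """Second form of Definition 2: min |q| over pairs with time(p) <= time(q)."""
    cx = C(x)
    if cx is None:
        return None
    vals = [
        len(q)
        for p, (out, tp) in TOY_U.items()
        if out == x and len(p) <= cx + c
        for q, (_, tq) in TOY_U.items()
        if tp <= tq
    ]
    return min(vals) if vals else None


for c in (0, 1):
    print(c, depth("0101", c), depth_bb("0101", c), depth_bb_pairs("0101", c))
# prints: 0 7 2 2
#         1 2 1 1
```

The two forms agree because bb is non-decreasing, so bb of the minimal running time equals the minimum of bb over all admissible running times.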

From the definition it is easy to see that depth^bb_c(x) ≤ C(x) ≤ |x| + O(1) for c ≥ 0. Recalling the definition of e-close functions, one can derive that the choice of another universal machine changes the depth function to an O(1)-close function:

Lemma 3. For all universal Turing machines U and V, there exists a constant c' such that for all c and x: depth_{c,U}(x) ≥ |x| (here depth is plain logical depth, not its Busy Beaver variant) implies

$$\mathrm{depth}^{bb}_{c+c',V}(x) \le \mathrm{depth}^{bb}_{c,U}(x) + c'.$$



Proof. Let w_V be the prefix such that V(w_V p) simulates U(p) for all p. Our result would follow easily if we assume that for any halting programs p, q on U such that time(p) ≤ time(q) we have time(w_V p) ≤ time(w_V q) on V, i.e. simulating U on V preserves the order of computation time. Indeed, any pair (p, q) usable in the definition of depth on U defines a pair (w_V p, w_V q) that can be used in the definition of depth on V. The program w_V p is minimal on V within c + |w_V| + |w_U| error (where w_U is the string that allows U to simulate V). Hence, the pair (w_V p, w_V q) witnesses an increase of the Busy Beaver depth by at most |w_V| for an increase of the significance by at most |w_V| + |w_U|.

In the case where the assumption is not true, we need to find a program of length at most |q| + O(1) on V that computes longer than time(w_V p). Let us try the following algorithm: on input q it determines all programs p that have running time at most time(q) on U, subsequently determines for all these p's the maximal running time T of a program w_V p on V, and finally prints a string of length T. On the input q, this algorithm produces an output of length T ≥ time(w_V p), and by universality there is a program of length |q| + O(1) on V that prints this string and hence computes for at least T steps.

This reasoning has a bug: we assumed that only finitely many programs on U have halting time at most time(q). To fix this, we remark that for any x, we only need to consider pairs (p, q) on U such that |p| ≤ |x| + O(1) and |x| ≤ time(q), which implies |p| ≤ time(q) + O(1). Adding this condition to the considered set of programs makes it finite. □

Recall that Bennett's definition of logical depth is the minimal computation time of a program on a prefix-free machine W (of some type) that is c-incompressible. We show that when scaled by inverse Busy Beaver functions, both notions of depth are O(log |x|)-close. On a prefix-free machine W, both (unscaled) depths are closely related: Bennett's logical depth of x at significance c + O(1) is at most depth_{c,W}(x), because any c-shortest program p for x is (c + O(1))-incompressible on W. On the other hand, by [13, Lemma 5.3] (attributed to Bennett [3]), depth_{c+O(1),W}(x) is bounded by a computable function of Bennett's logical depth of x with significance c. Hence, after rescaling with the inverse Busy Beaver function, both notions are O(1)-close. Exchanging the prefix-free machine for a plain machine, both depth notions are O(log |x|)-close; indeed this follows by the same argument as Lemma 3 with V = W, replacing |w_V| by O(log |x|)-terms in the proof (since |K_W(x) − C_U(x)| ≤ O(log |x|)).

4 Sophistication and Busy Beaver logical depth are close

Koppel [17] claimed the equivalence between logical depth and sophistication for infinite sequences. For such sequences, depth is defined as the minimal complexity of a total function rather than the complexity of a number. In this section we study the relationship between sophistication and Busy Beaver logical depth by showing that, if we allow logarithmic terms in the significance levels, these two measures are O(log n)-close.

Theorem 4. The functions depth^bb_c(x) and soph_c(x) are O(log |x|)-close, i.e. for all c and x, with n = |x|:

$$\mathrm{depth}^{bb}_{c+O(\log n)}(x) \le \mathrm{soph}_c(x) + O(\log n),$$
$$\mathrm{soph}_{c+O(\log n)}(x) \le \mathrm{depth}^{bb}_c(x) + O(\log n).$$

Here and below, implicit O(·) constants do not depend on c but might depend on the reference machine U.

Proof. To prove the first inequality, note that we can assume that c + 2 log n ≤ n + O(1), because otherwise the printing program would witness depth^bb_{c+2 log n}(x) ≤ O(log n).

Let (p, d) be a two-part code with total program p witnessing soph_c(x). This code (p, d) naturally defines a one-part code p̄d, where p̄ is a self-delimiting code of length at most |p| + 2 log n that evaluates U(p, d) (for large n). It remains to show that time(p̄d) is bounded by the computation time of a program of length |p| + O(log n). Indeed, the computation time is bounded by

$$t = \max\{ \mathrm{time}(\bar{p}e) : |e| \le n \},$$

and t can be computed from p and n, because U(p, ·) is total and hence every p̄e halts; consequently, a shortest program that prints t zeros satisfies the conditions.
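The proof only needs some self-delimiting encoding p̄ of p with about 2 log |p| bits of overhead. One standard choice, assumed here purely for illustration (the text does not fix a particular code), writes the binary length of p with every bit doubled, then the separator 01, then p itself; this uses |p| + 2⌊log2 |p|⌋ + 4 bits for non-empty p. The Python sketch below builds the one-part code p̄d and splits it back into (p, d).

```python
def self_delimiting(p: str) -> str:
    """Prefix-free encoding of p: the binary length of p with every bit
    doubled, the separator '01', then p itself."""
    length_bits = format(len(p), "b")
    return "".join(bit + bit for bit in length_bits) + "01" + p


def split_self_delimiting(s: str):
    """Read an encoded p from the front of s and return (p, rest)."""
    i, length_bits = 0, []
    while s[i:i + 2] != "01":   # doubled bits are '00' or '11', never '01'
        length_bits.append(s[i])
        i += 2
    i += 2
    n = int("".join(length_bits), 2)
    return s[i:i + n], s[i + n:]


p, d = "10110", "0111"
code = self_delimiting(p) + d          # the one-part code p̄d
print(code, split_self_delimiting(code))
```

Because the encoding is prefix-free, a machine reading p̄d knows where p ends without any external delimiter, which is exactly what turns the two-part code (p, d) into a one-part program.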

Now we prove the second inequality. For each k, l such that l ≤ k, consider a sequence of strings and markers

$$x_1, x_2, \ldots, x_i, \vdash, x_{i+1}, \ldots, x_j, \vdash, x_{j+1}, \ldots$$

enumerated as follows: dovetail all programs of length l and k, and enumerate the outputs of the programs of length k in order of computation time. Each time a program of length l halts, also append a marker to the series (if other programs with the same computation time appear, append the marker last). One easily observes that:

i) the sequence is enumerated uniformly from k, l,

ii) there are at most 2^l markers and at most 2^k strings,

iii) if a program of length k outputs x in time at most BB(l) = max{time(p) : |p| ≤ l}, then x appears in the sequence before its last marker.
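The construction of the sequence can be simulated on a handful of hypothetical halting programs. In the Python sketch below, the list PROGRAMS of (length, output, running time) triples is invented for illustration, and since it contains no programs shorter than l, the toy value of BB(l) is taken as the maximum running time of the listed length-l programs; the real BB(l) is of course not computable.

```python
from typing import List, Optional, Tuple

MARKER = None  # stands for a marker in the sequence

# Hypothetical halting programs of a toy machine: (length, output, running time).
PROGRAMS: List[Tuple[int, str, int]] = [
    (2, "00", 5), (2, "01", 9), (2, "11", 20),             # length l = 2
    (4, "0110", 3), (4, "1011", 5), (4, "0001", 12),
    (4, "1110", 20), (4, "0101", 31),                       # length k = 4
]


def marked_sequence(programs, k: int, l: int) -> List[Optional[str]]:
    """Outputs of length-k programs plus one marker per halting length-l
    program, ordered by running time; at equal times the marker comes last."""
    events = []
    for length, output, time in programs:
        if length == k:
            events.append((time, 0, output))
        if length == l:
            events.append((time, 1, MARKER))
    events.sort(key=lambda e: (e[0], e[1]))
    return [item for _, _, item in events]


def before_last_marker(seq, x: str) -> bool:
    """True if x occurs in seq before its last marker."""
    last = max(i for i, item in enumerate(seq) if item is MARKER)
    return x in seq[:last]


k, l = 4, 2
seq = marked_sequence(PROGRAMS, k, l)
bb_l = max(t for length, _, t in PROGRAMS if length == l)   # toy BB(l)
for length, x, t in PROGRAMS:
    if length == k:
        print(x, t <= bb_l, before_last_marker(seq, x))
```

Putting markers after strings with the same running time is what makes property (iii) hold with a non-strict inequality: the output "1110", produced exactly at time BB(l) = 20, still lands before the last marker.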

The second inequality follows from:

Claim 5. Every string x that appears before the last marker in a sequence satisfying properties (i) and (ii) satisfies soph_{k−C(x)+O(log k)}(x) ≤ l + O(log k).

Indeed, notice that if l = depth^bb_c(x), then there is a program for x of length at most C(x) + c that runs in time at most BB(l). Also, if k is the length of this program, then l ≤ k ≤ C(x) + c. Notice that x appears in the sequence before the last marker. By the claim, since k − C(x) ≤ c, we have soph_{c+O(log k)}(x) ≤ l + O(log k). Again we can assume, without loss of generality, that c + log n ≤ n + O(1), since otherwise soph_{c+log n}(x) ≤ O(log n) and the theorem would follow trivially. This implies log k ≤ O(log n), and hence the second inequality.

In order to complete the proof of Theorem 4, we now prove Claim 5. For any x as in the claim, we need to show that there exists a total program p such that |p| ≤ l + O(log k), U(p, d) = x and |p| + |d| ≤ k + O(log k) for some d.

With every segment of strings x_{i+1}, ..., x_j in the sequence, separated by two consecutive markers, we associate a function f that maps the lexicographically first j − i strings to x_{i+1}, ..., x_j and all other strings to the empty string. Notice that f is total (the marker closing the segment does appear, since x occurs before the last marker), and f can be computed from k, l and the number of markers that precede the defining segment. Therefore there exists a program p for f such that |p| ≤ l + O(log kl) = l + O(log k).

Now we only need to show that for any segment containing x, there also exists a total program p for f such that U(p, d) = x and |p| + |d| ≤ k + O(log k) for some d. Assume that the logarithmic size of the segment is δ, i.e. the segment contains between 2^δ and 2^{δ+1} − 1 strings. Observe that there are at most 2^{k−δ} segments of size at least 2^δ (by condition (ii)). Hence, there is a program p for f such that |p| ≤ k − δ + O(log klδ). Since the segment contains x, there is a d such that f(d) = x, and by construction |d| ≤ δ. Hence |p| + |d| ≤ (k − δ) + δ + O(log klδ) ≤ k + O(log k). Taking for p the shorter of the two programs for f, both required bounds hold simultaneously. □
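The function f attached to a segment, and the short argument d with f(d) = x, can be written out explicitly. The Python sketch below assumes the order ε, 0, 1, 00, 01, ... for "the lexicographically first j − i strings" (strings ordered by length, then lexicographically), and uses a hypothetical five-element segment; it shows that the index of x in the segment needs at most δ = ⌊log2(size)⌋ bits.

```python
def nth_string(i: int) -> str:
    """The i-th string (counting from 0) in the order ε, 0, 1, 00, 01, 10, ..."""
    return bin(i + 1)[3:]


def index_of(s: str) -> int:
    """Inverse of nth_string."""
    return int("1" + s, 2) - 1


def segment_function(segment):
    """The total function f of a segment x_{i+1}, ..., x_j: the first j - i
    strings are mapped onto the segment, every other string to ε."""
    def f(d: str) -> str:
        idx = index_of(d)
        return segment[idx] if idx < len(segment) else ""
    return f


def short_argument(segment, x: str) -> str:
    """A d with f(d) = x for the segment's function f."""
    return nth_string(segment.index(x))


segment = ["0001", "1110", "1011", "0110", "0100"]   # hypothetical segment
delta = len(segment).bit_length() - 1               # floor(log2 |segment|)
f = segment_function(segment)
for x in segment:
    d = short_argument(segment, x)
    print(repr(d), f(d) == x, len(d) <= delta)
```

Since f sends every string beyond the segment to ε, it is total once the segment is known, which matches the requirement in Definition 1 that U(p, d) be defined for all d.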

One can ask if an equivalence between sophistication and Busy Beaver logical depth is possible with constant precision in the significance levels. In Theorem 6 we give a negative answer to this question.

Theorem 6. For all large l there exist infinitely many strings x such that depth^bb_l(x) ≥ |x| − O(l) and soph_0(x) ≤ O(l² 2^l).

Notice that soph_c(x) is a non-increasing function of the significance level c, and hence it suffices to consider soph_0(x).

The difference between logical depth and sophistication is not so surprising: the former is based on one-part descriptions, while the latter is based on two-part descriptions for x, i.e. the Turing machine computes x from two strings separated by a blank. The position of the blank can encode a logarithmic amount of information, and this sometimes allows us to make a two-part code shorter than the Kolmogorov complexity (see Lemma 7). A different question is whether an equivalence holds with O(1)-precision when the notions are defined with prefix-free complexity. The answer is negative, which follows from [?, Proposition 3.2.2, p. 3-14]. The proof of Theorem 6 uses the two subsequent lemmas.

Lemma 7. For every string x and real k such that k + log k ≤ |x|, we have

$$\mathrm{soph}_{|x| - C(x) - \log k + O(1)}(x) \le k + O(1).$$

Proof. Let x be a string of length n. In order to prove the lemma, it is sufficient to show that there is a two-part description (p, d) for x satisfying |p| + |d| ≤ n − log k + O(1) and |p| ≤ k + O(1). The idea of the proof is to use the length |p| to encode the last log k − 1 bits of x. Let i be the index of the last log k − 1 bits of x in the lexicographic order of strings (i.e., x_{[n−log k+2 : n]} is the i-th string in the sequence ε, 0, 1, 00, 01, ...). Notice that i < k.

Let p be the program that on input d first prints x_{[1:i]}, subsequently prints d, and finally prints x_{[n−log k+2 : n]}. Notice that this defines a total function. Moreover, only the information in x_{[1:i]} is needed to evaluate this function, since the last part of the output can be computed from i. Hence, we can construct p such that |p| = i + O(1) ≤ k + O(1). Furthermore, if we choose d = x_{[i+1 : n−log k+1]}, we have

$$U(p, d) = x \quad\text{and}\quad |p| + |d| \le (i + O(1)) + (n - i - \log k + 1) \le n - \log k + O(1).$$

□
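The encoding trick of Lemma 7, where the length of the first part stores the last log k − 1 bits of x, can be checked mechanically. The Python sketch below assumes k ≥ 2, log k = ⌊log2 k⌋ as in Section 3, and indices counted from 0 in the order ε, 0, 1, 00, ...; the helper names are hypothetical. The pair returned by lemma7_code plays the role of (x_{[1:i]}, d), and lemma7_decode is the total function hard-wired in p.

```python
def nth_string(i: int) -> str:
    """The i-th string (counting from 0) in the order ε, 0, 1, 00, 01, ..."""
    return bin(i + 1)[3:]


def index_of(s: str) -> int:
    """Inverse of nth_string."""
    return int("1" + s, 2) - 1


def lemma7_code(x: str, k: int):
    """Split x into (x[1:i], d), where i is the index of the last
    log k - 1 bits of x (assumes k >= 2 and k + log k <= |x|)."""
    m = k.bit_length() - 2        # log k - 1, with log k = floor(log2 k)
    tail = x[len(x) - m:]         # the last m bits (empty when m = 0)
    i = index_of(tail)            # i < k
    return x[:i], x[i:len(x) - m]


def lemma7_decode(prefix: str, d: str) -> str:
    """The total function of p: print prefix, then d, then the string
    whose index equals |prefix|."""
    return prefix + d + nth_string(len(prefix))


x, k = "1101001110010111", 8
prefix, d = lemma7_code(x, k)
print(prefix, d, lemma7_decode(prefix, d) == x, len(prefix) + len(d))
# prints: 110100 11100101 True 14
```

In the example n = 16 and log k = 3, so the two parts together have n − log k + 1 = 14 bits, while the first part has length i = 6 < k.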



Lemma 8. For some constant c, for all large d and for all n there exists a string x of length n such that

$$C(x) > n - d,$$
$$\mathrm{depth}^{bb}_{d - 2\log d - c}(x) > n - d.$$

Proof. Assume that n > d, since otherwise the lemma would be trivial. We choose x to be the lexicographically first string of length n which is incompressible in time BB(n − d), i.e. no program strictly shorter than n computes x in time BB(n − d). To show the inequalities of the lemma, it suffices to show that

$$n - d < C(x) \le n - d + 2\log d + O(1).$$

Indeed, the right inequality implies that every program for x of length at most C(x) + d − 2 log d − c is strictly shorter than n for a suitable constant c; by the choice of x such a program runs longer than BB(n − d), so every program running at least as long has length greater than n − d.

For the right inequality, notice that we can compute x from BB(n − d) and n. Furthermore, with O(1) bits of information, n can be computed from d and the length of a witnessing program for BB(n − d) (notice that a program witnessing BB(n − d) has length n − d + O(1)). Hence x has a program of length n − d + 2 log d + O(1).

For the left inequality, notice that for large d the right inequality implies C(x) < n. By the choice of x, any program of length at most n − 1 producing x must do so in time longer than BB(n − d), and by the definition of BB(n − d) this program must be strictly longer than n − d. □

Proof of Theorem 6. Let c' be the constant in the significance of Lemma 7. For any large k we apply Lemma 8 with d = log k − c' (and any length n ≥ k + log k, which yields infinitely many strings) to obtain a string x of complexity C(x) > |x| − log k + c'. Applying this bound in Lemma 7, the significance of the sophistication becomes at most |x| − (|x| − log k + c') − log k + c' = 0, and we conclude that soph_0(x) ≤ k + O(1) ≤ O(2^d).

At the same time, x satisfies depth^bb_{d−2 log d−c}(x) > |x| − d. Hence, by substituting l = d − 2 log d − c, the inequalities of the theorem are satisfied.

Since k can be any large number, d and l can also be any large number. □
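The passage from soph_0(x) ≤ k + O(1) to the bound O(l² 2^l) stated in Theorem 6 is elementary arithmetic that the proof leaves implicit. A short derivation, taking log = ⌊log2⌋ as in Section 3 and writing c and c' for the constants of Lemmas 8 and 7:

$$k < 2^{\log k + 1} = 2^{d + c' + 1}, \qquad 2^{d} = 2^{\,l + 2\log d + c} \le 2^{c}\, d^{2}\, 2^{l},$$
$$l = d - 2\log d - c \ge d/2 \text{ for large } d \;\Longrightarrow\; k < 2^{c + c' + 1}\, d^{2}\, 2^{l} \le 2^{c + c' + 3}\, l^{2}\, 2^{l} = O(l^{2}\, 2^{l}).$$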

5 Sophistication is unstable 

In [1] the authors conjecture that Koppel's definition of sophistication might not be stable, in the sense that small changes in the constant c of the significance level could drastically change the value of soph_c(x). To avoid this potential problem, they incorporated the significance level as a penalty in the formula, trying to obtain a more robust measure, called coarse sophistication. However, one could argue that this measure is not robust either, in the sense that drastic changes can happen for slight changes of the weight of the penalty function [26].

In this section, we prove that with small changes in the significance level c of sophistication, its value can change almost maximally, answering affirmatively the conjecture in [1] discussed above.

From the results mentioned in Section 2 we know that all sophistication measures are unstable if the significance is increased by O(log |x|). Moreover, the theorem below can be proven for probabilistic and set sophistication, when defined in terms of prefix-free complexity, with the same proof technique as [27, Theorem IV.4]. However, for plain Kolmogorov complexity this technique no longer applies, and hence it could be the case that these sophistication measures are more stable. In the next theorem we show that this is not the case.

Theorem 9. For all large c there are infinitely many x such that¹⁰

$$\mathrm{soph}_c(x) - \mathrm{soph}_{c+O(\log c)}(x) \ge \tfrac{1}{2}|x|.$$

Recall that in our notation, the implicit O(·) constants do not depend on c.

Proof. It is sufficient to show that for all k, c there is a string x of length n = k + log k + 2 such that soph_{c−O(1)}(x) ≥ k − c and soph_{c+O(log c)}(x) ≤ k/8 + O(1). Indeed, since n = k + O(log k), for k large compared to c the difference (k − c) − (k/8 + O(1)) exceeds n/2, and renaming c gives the theorem.

Our construction of x implies that C(x) ≤ k − c + O(log c). Hence, applying Lemma 7 with k replaced by k/8 implies that soph_{c+O(log c)}(x) ≤ k/8 + O(1) [the significance is (k + log k + 2) − (k − c + O(log c)) − log(k/8) + O(1) = c + O(log c)]. Furthermore, soph_{c−O(1)}(x) ≥ k − c if we consider x such that C(x) ≥ k − c and such that there is no pair (p, d) satisfying the following properties:

i) U(p, d) = x and |p| + |d| < k,

ii) |p| < k − c and U(p, y) is defined for all y such that |p| + |y| < k.

Now we describe how we construct such an x. We keep a list of all strings of length n, which receive marks. At each time, the lexicographically first string that has no mark is our candidate. The marks are given as follows: we dovetail all programs, and if a program of length less than k − c halts with an output of length n, then that output is marked. Clearly, there are at most 2^{k−c} different strings that are marked in this way. Moreover, if a program p is found satisfying condition (ii), then all corresponding strings U(p, y) of length n are simultaneously marked. Thus these marks appear in less than 2^{k−c} different moments, and the number of such marks is at most Σ_{i=0}^{k} 2^i 2^{k−i} = (k + 1)2^k. Hence, the total number of marks is strictly less than (k + 1)2^{k+1} ≤ 2^n, which means that there is always a candidate for x, and at some point a candidate x is selected that will not be replaced anymore. By construction, C(x) ≥ k − c and there is no pair (p, d) for which both conditions (i) and (ii) are satisfied.

¹⁰ For any ε > 0, we can replace the term ½|x| by (1 − ε)|x| if the significance of the second sophistication term is replaced by c + log(1/ε) + O(log c).

Now we have to prove that C(x) ≤ k − c + O(log c). Since there are less than 2^{k−c} + 2^{k−c} moments where new marks are given, the candidate for x is replaced less than 2^{k−c+1} times, and hence C(x|k, c) ≤ k − c + O(1). In fact, x can be computed from the total number N of replacements. Hence from c and from N, represented in binary with k − c + 1 bits, we can compute x (k can be computed from c and the length of N in binary). Thus C(x|c) ≤ k − c + O(1), and hence C(x) ≤ k − c + O(log c). □
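The counting behind "there is always a candidate for x" is plain arithmetic and can be checked for small parameters. The Python sketch below is an illustration, not part of the proof: it evaluates the bound on the number of marks used in the construction and compares it with (k + 1)2^{k+1} and with the number 2^n of candidate strings, taking log k = ⌊log2 k⌋ as in Section 3.

```python
def mark_budget(k: int, c: int) -> int:
    """Bound on the number of marks: at most 2^(k-c) outputs of programs shorter
    than k - c, plus sum_{i=0}^{k} 2^i * 2^(k-i) strings U(p, y)."""
    direct = 2 ** (k - c)
    from_total_programs = sum(2 ** i * 2 ** (k - i) for i in range(k + 1))
    return direct + from_total_programs


def number_of_candidates(k: int) -> int:
    """2^n strings of length n = k + log k + 2, with log k = floor(log2 k)."""
    n = k + (k.bit_length() - 1) + 2
    return 2 ** n


for k in (4, 10, 20):
    c = k // 3
    print(k, c, mark_budget(k, c), (k + 1) * 2 ** (k + 1), number_of_candidates(k))
```

The second inequality is tight when k + 1 is a power of two, which is why the length n = k + log k + 2 (rather than k + log k + 1) is needed.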

We remark that with essentially the same argument 
this result holds for set sophistication defined with 
plain complexity. 